TensorFlow Object Detection API Tutorial for Win10 and Ubuntu

The Faster R-CNN implementation used in this experiment comes from the object_detection API in tensorflow/models; training and testing were done on the COCO2014 dataset. This tutorial was completed and verified by me, Hu Baolin, and Dr. Han on Win10 and Ubuntu 16.04.

1. Environment and Dependencies

  • TensorFlow Object Detection API: download and install it by following the official installation instructions.

    Final test: python object_detection/builders/model_builder_test.py
    If it prints the result shown below, the API is installed successfully; if installation fails, debug from the reported errors.
    Note: add object_detection and slim to your environment variables (PYTHONPATH), otherwise you will get "no module named ..." errors for these two modules.
    On Win10 you can instead run python setup.py install inside each of those two folders.
    (Screenshot: test result after the full environment is configured)
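A quick way to confirm that the path setup took effect is a minimal import check (a sketch; the module names assume the standard tensorflow/models/research layout):

```python
# Both imports succeed only if object_detection and slim are on the path;
# an ImportError here means PYTHONPATH (or setup.py install) is not set up.
from object_detection.builders import model_builder  # from research/object_detection
from nets import inception_v2                        # from research/slim
```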
  • Download the COCO2014 dataset and keep the following directory layout (a quick layout check follows the listing):

```
COCO/DIR/
  annotations/
    instances_train2014.json
    instances_val2014.json
  train2014/
    COCO_train2014_*.jpg
  val2014/
    COCO_val2014_*.jpg
```
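Before converting anything it is worth verifying the layout. A minimal sketch (COCO_DIR is a placeholder for your own dataset root, e.g. the E:\TEMP_PROJECTS\COCO path used below):

```python
import os

COCO_DIR = 'E:/TEMP_PROJECTS/COCO'  # replace with your own dataset root
for rel in ('annotations/instances_train2014.json',
            'annotations/instances_val2014.json',
            'train2014', 'val2014'):
    status = 'ok' if os.path.exists(os.path.join(COCO_DIR, rel)) else 'MISSING'
    print('{:7s} {}'.format(status, rel))
```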

2. Usage of the Object Detection API Code

1. Create TFRecord files in COCO or VOC format

  object_detection/dataset_tools contains the Python scripts that create TFRecords. Taking COCO2014 as an example:

```
python object_detection/dataset_tools/create_coco_tf_record.py --logtostderr \
    --train_image_dir="E:/TEMP_PROJECTS/COCO/train2014/" \
    --val_image_dir="E:/TEMP_PROJECTS/COCO/val2014/" \
    --train_annotations_file="E:/TEMP_PROJECTS/COCO/annotations/instances_train2014.json" \
    --val_annotations_file="E:/TEMP_PROJECTS/COCO/annotations/instances_val2014.json" \
    --output_dir="../data/coco"
```
PS: if you do not convert the test data, find the assert statements in create_coco_tf_record.py and comment out the test_file-related code.
(Screenshot: create_coco_tf_record output)
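To sanity-check the conversion, count the serialized examples with TF 1.x's record iterator (a sketch; the file name coco_train.record is an assumption based on what create_coco_tf_record.py writes -- adjust it if your version shards the output):

```python
import tensorflow as tf

record_path = '../data/coco/coco_train.record'  # assumed output file name
# Each item yielded by the iterator is one serialized tf.Example.
count = sum(1 for _ in tf.python_io.tf_record_iterator(record_path))
print('{}: {} examples'.format(record_path, count))
```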

2. Train Faster R-CNN NASNet

  • Download the pre-trained model and modify the config file:
1. Set the number of classes, the number of training steps, the pre-trained checkpoint path, and the TFRecord paths. (Note: with only 11 GB of GPU memory, reduce the image size in the config to 600×600 or below.)
2. num_classes -- change it to the number of classes in your own dataset.
3. In samples/configs/faster_rcnn_nas_coco.config there are 5 occurrences of PATH_TO_BE_CONFIGURED; replace each with the corresponding path (training set, label_map, model checkpoint, etc.). See the config file for details.
4. fine_tune_checkpoint: set it to the pre-trained checkpoint path.
5. Adjust num_steps as needed. (A sketch for patching these fields programmatically follows this list.)
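Besides editing the file by hand, the same fields can be patched with the API's config utilities. A sketch, assuming the field layout of the faster_rcnn_nas_coco.config template; all values shown are placeholders:

```python
import tensorflow as tf
from google.protobuf import text_format
from object_detection.utils import config_util

# Load the template config into its component protos.
configs = config_util.get_configs_from_pipeline_file(
    'object_detection/samples/configs/faster_rcnn_nas_coco.config')

configs['model'].faster_rcnn.num_classes = 90                  # your class count
configs['train_config'].fine_tune_checkpoint = (
    'object_detection/faster_rcnn_nas_coco_2018_01_28/model.ckpt')
configs['train_config'].num_steps = 200000                     # adjust as needed
configs['train_input_config'].tf_record_input_reader.input_path[:] = [
    'object_detection/data/coco/coco_train.record']
configs['train_input_config'].label_map_path = (
    'object_detection/data/mscoco_label_map.pbtxt')

# Reassemble and write the patched pipeline config.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
with tf.gfile.Open('faster_rcnn_nas_coco_patched.config', 'w') as f:
    f.write(text_format.MessageToString(pipeline_proto))
```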
  • Run the training command (working directory: the research folder):

```
python object_detection/legacy/train.py --logtostderr \
    --train_dir=object_detection/data/coco/ \
    --pipeline_config_path=object_detection/samples/configs/faster_rcnn_nas_coco.config
# train_dir --- directory where training checkpoints and summaries are written
# pipeline_config_path --- path to the config file
```
(Screenshot: training in progress)
  • Evaluate the model (same working directory as above):

Note:
1. In legacy/evaluator.py, change line 58 to: EVAL_DEFAULT_METRIC = 'coco_detection_metrics'
2. In utils/object_detection_evaluation.py, change unicode to str if you run Python 3 (illustrated below).
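The second edit is a mechanical rename, since the unicode builtin no longer exists in Python 3. Illustrative only; the exact lines depend on the repo revision:

```python
# The file uses the Python-2-only builtin, e.g.:
#     display_name = unicode(name)   # NameError under Python 3
name = 'airplane'                    # illustrative value
display_name = str(name)             # str() replaces unicode() on Python 3
print(display_name)
```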
```
python object_detection/legacy/eval.py --logtostderr \
    --checkpoint_dir=object_detection/faster_rcnn_nas_coco_2018_01_28 \
    --eval_dir=object_detection/data/coco/ \
    --pipeline_config_path=object_detection/samples/configs/faster_rcnn_nas_coco.config \
    --run_once=True
```
(Screenshot: evaluation results, abbreviated)

3. Fine-tuning on VOC

  1. Build the VOC data in TFRecord format:
```
python object_detection/dataset_tools/create_pascal_tf_record.py \
    --data_dir=/mnt/fpan/HBL_data/data_set/VOCdevkit \
    --year=VOC2007 \
    --output_path=object_detection/data/voc/train2007.record  # output_path is a file name, not a directory
```
  2. Write a new object_detection/samples/configs/faster_rcnn_nas_voc.config file.

    Set fine_tune_checkpoint to the path of the checkpoint trained on COCO.

    Evaluation works the same way as above.
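num_classes in the new config must match the 20 VOC classes; a quick check with the API's label_map_util (pascal_label_map.pbtxt ships with the repo):

```python
from object_detection.utils import label_map_util

# Maps class names to ids; its size is the value for num_classes.
voc_map = label_map_util.get_label_map_dict(
    'object_detection/data/pascal_label_map.pbtxt')
print(len(voc_map))  # 20 -> num_classes in faster_rcnn_nas_voc.config
```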

4. Faster R-CNN NASNet Demo

See object_detection_tutorial.py. The file must be run from the object_detection directory, or you need to fix the paths inside it.

```python
# coding: utf-8

# # Object Detection Demo
# Welcome to the object detection inference walkthrough! This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image. Make sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')


# ## Env setup

# This is needed to display the images.
#get_ipython().magic('matplotlib inline')


# ## Object detection imports
# Here are the imports from the object detection module.

from utils import label_map_util

from utils import visualization_utils as vis_util


# # Model preparation

# ## Variables
#
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
#
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# What model to download.
MODEL_NAME = 'faster_rcnn_nas_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')


# ## Download Model

# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
# tar_file = tarfile.open(MODEL_FILE)
# for file in tar_file.getmembers():
#   file_name = os.path.basename(file.name)
#   if 'frozen_inference_graph.pb' in file_name:
#     tar_file.extract(file, os.getcwd())

# ## Load a (frozen) Tensorflow model into memory.

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


# # Detection

# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
```
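Because the %matplotlib inline magic is commented out, running the file as a plain script shows nothing; one extra line at the end displays the figures (or swap in plt.savefig to write them to disk instead):

```python
plt.show()  # display all figures when running outside a notebook
```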